Section with comments and questions on the analysis and workflow - let’s mostly discuss here
We need to agree on names of positions in general and which fall into technical/non-technical groups.
I did some text mining and, for now, defined the technical and non-technical positions manually by looking through the resulting word list (without stop words). I decided to classify them as follows:
Technical: analy|special|engine|develop|technic|optimi
Non-technical: manage|direct|writ|consult|coordinat|edito|market|sale|social|strateg|supervis
!!! This is already a very good selection. I would remove “engine” and “optimi”, as they are part of “Search Engine Optimization” and can belong to both worlds, for example a “Search Engine Optimization Content Writer”. Note: this may change in the future, so for now there is no need to report on key takeaways.
I am not sure if all of them are correct (hopefully most are), but “specialist”, for example, is a term I would take another look at. I searched descriptions of specialist positions, and they read as if “SEO specialist” is a common term for a technical position. Correct?
–> Okay, I have included an adjusted version, keeping the first as well so Brian can decide (or we can go back to it later or simply remove it if it’s bs anyway).
I thought I’d make a doughnut chart (as we also discussed on the phone). However, the doughnut did not look that good and was a bit too fancy, but on a Cartesian coordinate system these stacked bars looked very nice! So it’s a slightly fancier graph, but I think that’s good for a bit of variety and for drawing attention. I hope you and Brian like it!
!!! Love the stacked bars - looks more professional and business-like. :)
With my new CSS/HTML skills it was easy to change the boring colors and fonts to match Backlinko’s and our plot design ;)
!!! Great…
!!! I put the more advanced research questions in parentheses. Let’s focus on the basic questions for now; I want to get the basics right. In our last project, we (you) put a lot of effort into the large vs. other domains, and in the end he did not use them.
–> Argh, in my version this was not included yet, so I made all the plots on company info except for specialized tasks, which is a bit difficult anyway (more detailed comments below for each section). Going to ignore questions in parentheses from now on.
We analyzed the job title data using text mining techniques. In a first step, we tokenized the job titles into single words and visualized their frequencies. Stop words and words that appeared fewer than 7 times were removed to make the graph easier to grasp.
In a second step, we analyzed sequences of words in the job titles. The sorted bar plot shows the most frequent sequences of consecutive words (5 or more occurrences), colored by category.
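A minimal Python sketch of this first step, assuming the titles are plain strings. The stop word set is a hypothetical, abbreviated stand-in for the full list used in the analysis, and the letters-only tokenizer is likewise an assumption:

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop word list (the analysis used a full list).
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "or", "the", "to", "with"}

def title_word_counts(titles, min_count=7):
    """Tokenize job titles into lowercase words, drop stop words,
    and keep only words that occur at least `min_count` times."""
    counts = Counter()
    for title in titles:
        # Keep runs of letters only; digits and punctuation are discarded.
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    # Sorted by frequency (descending), ready for a bar plot.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

For example, eight “SEO Specialist” titles plus three “Content Writer” titles yield only “seo” and “specialist”, because the other two words fall below the threshold of 7.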
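The consecutive-word counts can be sketched the same way. This assumes sequences are counted within each title without removing stop words and that the sequence length `n` is a free parameter; both are assumptions, not necessarily what the plot used:

```python
import re
from collections import Counter

def ngram_counts(titles, n=2, min_count=5):
    """Count runs of n consecutive words within each title and keep
    those occurring at least `min_count` times."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        # Zipping n shifted copies of the word list yields every run of n words;
        # titles shorter than n words contribute nothing.
        for gram in zip(*(words[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]
```

Calling it with `n=2` and `n=3` and concatenating the results gives one list that can be sorted for the bar plot.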
We manually classified the positions as technical or non-technical, removing all words that are not specific to either category:
Technical: analy|special|engine|develop|technic|optimi
Non-technical: manage|direct|writ|consult|coordinat|edito|market|sale|social|strateg|supervis
The modified stacked bar plot shows the number of words found per job category and, as a second stacked bar next to it, the most common words per category (with labels for words that occurred at least 20 times). The height of each stack also indicates the count; the width is arbitrary.
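A sketch of how the two stem patterns could be applied to a title, assuming the first six stems (analy to optimi) form the technical pattern and the remaining ones the non-technical pattern. Titles matching both patterns are flagged as “both”, which makes mixed cases like “Search Engine Optimization Content Writer” easy to spot:

```python
import re

# Stem patterns from the notes; matching is case-insensitive substring search.
TECHNICAL = re.compile(r"analy|special|engine|develop|technic|optimi", re.IGNORECASE)
NON_TECHNICAL = re.compile(
    r"manage|direct|writ|consult|coordinat|edito|market|sale|social|strateg|supervis",
    re.IGNORECASE,
)

def classify_title(title):
    """Return 'technical', 'non-technical', 'both' or 'other' for a job title."""
    tech = TECHNICAL.search(title) is not None
    non_tech = NON_TECHNICAL.search(title) is not None
    if tech and non_tech:
        return "both"
    if tech:
        return "technical"
    if non_tech:
        return "non-technical"
    return "other"
```

Dropping “engine” and “optimi” from `TECHNICAL`, as suggested earlier, would move such SEO content titles from “both” to “non-technical”.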
(Not sure if “specialist” is technical only. I looked it up, and it sounds as if “SEO specialist” is generally a very technical position.)
A bubble map showing company locations by city (excluding “states”, “worldwide” and “remote”).
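A sketch of the location filter behind the city map, assuming (hypothetically) that locations are stored as strings such as “Austin, TX”. Any location containing one of the excluded words is skipped, and the city is taken as the part before the first comma:

```python
from collections import Counter

# Location strings containing these words do not identify a city.
EXCLUDE = ("states", "worldwide", "remote")

def city_counts(locations):
    """Count job offers per city, skipping placeholder locations.
    Assumes a hypothetical 'City, ST' format."""
    counts = Counter()
    for loc in locations:
        if any(term in loc.lower() for term in EXCLUDE):
            continue
        city = loc.split(",")[0].strip()
        counts[city] += 1
    return counts
```

The resulting counts map directly to bubble sizes once each city is geocoded.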
Version with the Backlinko cyan as outline:
A choropleth hexagonal tile map showing company locations by state (excluding “states”, “worldwide” and “remote”). Hexagonal tile maps are useful to remove the effect of area (i.e., our variable “job offers” is not related to a state’s area).
Version with the Backlinko cyan as outline:
–> ??? What do we define as “more specialized tasks”?
From the job descriptions, we extracted the required/desired degree:
Bachelors: B.Ba.|B.Sc.|BBa|BSc|BBA|BSC|Bachelors
Masters: M.Ba.|M.Sc.|MBa|MSc|MBA|MSC|Masters
Doctorate: Ph.D.|PhD|Doctorate
In total, we found 39 positions mentioning a Bachelors degree, 10 mentioning a Masters and only one mentioning a Doctorate.
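A Python sketch of the degree extraction using the patterns above. One point worth checking: in a regular expression an unescaped “.” matches any character, so the sketch escapes the dots. Matching is case-sensitive, as the separate BBa/BBA variants suggest, and, like the totals above, it counts mentions rather than strict requirements:

```python
import re

# Degree patterns from the notes, with dots escaped so "." only matches a
# literal period (unescaped, "B.Sc." would also match e.g. "BxSc").
DEGREES = {
    "Bachelors": r"B\.Ba\.|B\.Sc\.|BBa|BSc|BBA|BSC|Bachelors",
    "Masters": r"M\.Ba\.|M\.Sc\.|MBa|MSc|MBA|MSC|Masters",
    "Doctorate": r"Ph\.D\.|PhD|Doctorate",
}

def degree_counts(descriptions):
    """Count how many descriptions mention each degree level
    (each description counts at most once per level)."""
    counts = {level: 0 for level in DEGREES}
    for text in descriptions:
        for level, pattern in DEGREES.items():
            if re.search(pattern, text):
                counts[level] += 1
    return counts
```

A description such as “PhD not needed” still counts as a Doctorate mention, so the single Doctorate hit may be worth a manual check.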
–> I’ve added this since it’s low-hanging fruit after the last section ;) –> 3 different versions to deal with overplotting of the x-axis labels
–> ??? Can you come up with a list, or should I try to extract common words? I don’t know yet how to determine the terms between skills and… yeah, what marks the end of the skills section?
–> So I tokenized the descriptions, removed stop words and numbers, and also manually removed nonsense/non-skill-related words. There might be more, but if we keep this, we can take a closer look, I would say. I only realized later that it has lower priority. However, these word clouds can certainly be used in another context anyway.
!!! For sections 4 and 5: To get the words we are looking for, it would be useful to see a simple word cloud or a df with the tokenized words (single words and bigrams). That way, we can scan through the list and select those that fit job tasks, programming languages, knowledge of popular tools, etc. What do you think? Open to other approaches.
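One possible way to build such a table, sketched in Python with hypothetical stop word and exclusion lists. Bigrams are formed from the full token sequence and dropped if either word is a stop word, a number or manually excluded, so removed words never glue their neighbours into false pairs. The resulting rows could be passed to `pandas.DataFrame` for scanning:

```python
import re
from collections import Counter

# Hypothetical stop word and manual exclusion lists (illustrative only).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "of", "the", "to", "we", "with", "you"}
EXCLUDE = {"job", "apply"}

def _keep(word):
    """True if a token is neither a stop word, excluded, nor a number."""
    return word not in STOP_WORDS and word not in EXCLUDE and not word.isdigit()

def term_table(descriptions, min_count=1):
    """Return rows (term, n_words, count) for single words and bigrams,
    sorted by descending count, then alphabetically."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in tokens if _keep(w))
        counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]) if _keep(a) and _keep(b))
    rows = [(term, term.count(" ") + 1, n) for term, n in counts.items() if n >= min_count]
    return sorted(rows, key=lambda row: (-row[2], row[0]))
```

Raising `min_count` shortens the list to the terms worth reviewing by hand.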
What programming languages are most often required?
What languages are most often required in combination (e.g., HTML, CSS)?
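A sketch for both questions, assuming a hypothetical, hand-picked list of languages. Each posting counts once per language and once per pair of languages. Names are matched case-insensitively only when not surrounded by letters or digits, so, for example, “CSS3” would not count as “CSS”, which may or may not be wanted:

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical list of languages to look for; extend as needed.
LANGUAGES = ["HTML", "CSS", "JavaScript", "Python", "SQL", "PHP"]

def language_stats(descriptions):
    """Count postings mentioning each language and each pair of languages."""
    single, pairs = Counter(), Counter()
    for text in descriptions:
        found = [
            lang for lang in LANGUAGES
            if re.search(r"(?<![A-Za-z0-9])" + re.escape(lang) + r"(?![A-Za-z0-9])",
                         text, re.IGNORECASE)
        ]
        single.update(found)
        # `found` keeps the LANGUAGES order, so a pair is always keyed the same way.
        pairs.update(combinations(found, 2))
    return single, pairs
```

`single.most_common()` answers the first question and `pairs.most_common()` the second.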